Feature extraction strategies in deep learning based acoustic event detection
نویسندگان
چکیده
Non-speech acoustic events are significantly different between them, and usually require access to detail rich features. That is why directly modeling a real spectrogram can provide a significant advantage, instead of using predefined features that usually compress and downsample detail as typically done in speech recognition. This paper focuses on the importance of feature extraction for deep learning based acoustic event detection, and more specifically on exploiting local spectro-temporal features of sounds. We do this in two ways: (1) outside the model, using multiple resolution spectrogram simultaneously based on the fact that there is a time-frequency detail trade-off that depends on the resolution with which a spectrogram is computed (e.g. ‘steps’ would require a finer time resolution, while sounds that span many frequencies require finer frequency detail); and (2), with a model that implicitly exploits locality, convolutional neural networks, which are a state-of-the-art 2D feature extraction model. An experimental evaluation shows that the presented approaches outperform state-of-the-art deep learning baseline with a noticeable gain in the CNN case, and provides insights regarding CNN-based spectrogram characterization.
منابع مشابه
Combining pattern recognition and deep-learning-based algorithms to automatically detect commercial quadcopters using audio signals (Research Article)
Commercial quadcopters with many private, commercial, and public sector applications are a rapidly advancing technology. Currently, there is no guarantee to facilitate the safe operation of these devices in the community. Three different automatic commercial quadcopters identification methods are presented in this paper. Among these three techniques, two are based on deep neural networks in whi...
متن کاملDNN Transfer Learning Based Non-Linear Feature Extraction for Acoustic Event Classification
Recent acoustic event classification research has focused on training suitable filters to represent acoustic events. However, due to limited availability of target event databases and linearity of conventional filters, there is still room for improving performance. By exploiting the non-linear modeling of deep neural networks (DNNs) and their ability to learn beyond pre-trained environments, th...
متن کاملLearning multi-labeled bioacoustic samples with an unsupervised feature learning approach
Multi-label Bird Species Classification competition provides an excellent opportunity to analyze the effectiveness of acoustic processing and mutlilabel learning. We propose an unsupervised feature extraction and generation approach based on latest advances in deep neural network learning, which can be applied generically to acoustic data. With state-of-the-art approaches from multilabel learni...
متن کاملRobust Features in Deep-Learning-Based Speech Recognition
Recent progress in deep learning has revolutionized speech recognition research, with Deep Neural Networks (DNNs) becoming the new state of the art for acoustic modeling. DNNs offer significantly lower speech recognition error rates compared to those provided by the previously used Gaussian Mixture Models (GMMs). Unfortunately, DNNs are data sensitive, and unseen data conditions can deteriorate...
متن کاملSpeaker Adaptation of Various Components in Deep Neural Network based Speech Synthesis
In this paper, we investigate the effectiveness of speaker adaptation for various essential components in deep neural network based speech synthesis, including acoustic models, acoustic feature extraction, and post-filters. In general, a speaker adaptation technique, e.g., maximum likelihood linear regression (MLLR) for HMMs or learning hidden unit contributions (LHUC) for DNNs, is applied to a...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2015